
Conversation

wangxiyuan (Collaborator) commented on Nov 24, 2025

Bump vLLM version to v0.11.2

What's broken and changed by vLLM:

  1. structured_output is broken by [Core] Async scheduling + structured outputs compatibility vllm#26866
  2. get_mrope_input_positions is broken by [Model] Pass mm_features directly into get_mrope_input_positions vllm#28399
  3. graph mode is broken by Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch vllm#25110; we'll upgrade torch to 2.8 to fix this later
  4. embedding is broken by Rename clashing method names for vLLM model protocol vllm#27583
  5. get_attn_backend_cls and the attention backend are broken by [CI Failure] Fix backend selection for encoder-only models vllm#28534
  6. spec decode is broken by [Redo] #26368 vllm#28771
  7. sp feature is broken by [compile] Enable sequence parallelism matching w/o custom ops enabled  vllm#27126
  8. mtp is broken by [AsyncScheduling] Don't schedule past request max_tokens vllm#27922
  9. lora is broken by [Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding vllm#21068
  10. execute_model is broken by [Core] Async scheduling + structured outputs compatibility vllm#26866
  11. VLLM_DISABLE_SHARED_EXPERTS_STREAM env is broken by [Bug] Fix env string "0" same to True vllm#28159 (see the sketch after this list)
  12. kv cache is broken by [Hybrid] Pass kernel block size to builders vllm#27753
  13. dp is broken by Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch vllm#25110
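
To make item 11 concrete, here is a minimal, self-contained sketch of the env-string pitfall that vllm#28159 fixes. The variable names and the parsing helper are illustrative assumptions, not the actual vLLM code:

import os

# Simulate a user explicitly disabling the flag with "0".
os.environ["VLLM_DISABLE_SHARED_EXPERTS_STREAM"] = "0"

# Buggy pattern: any non-empty string, including "0", is truthy in Python.
buggy = bool(os.environ.get("VLLM_DISABLE_SHARED_EXPERTS_STREAM", ""))

# Safer pattern: compare against an explicit set of "true" spellings.
fixed = os.environ.get("VLLM_DISABLE_SHARED_EXPERTS_STREAM", "0").lower() in ("1", "true", "yes")

print(buggy)  # True  -> "0" is mistakenly treated as enabled
print(fixed)  # False -> "0" is correctly treated as disabled

This only illustrates the class of bug named in the PR title ("env string '0' same to True"); the real fix lives in vLLM's env handling.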

What's broken and changed by ourselves:

  1. qwen vl is broken by [Model][MM] Extract conv layer as CustomOp vllm#28455. We'll remove the model files in the future to avoid this kind of error.
  2. Engine core is broken by [V1] Support MP Executor for multi node distributed inference vllm#23691. We'll remove the patch file in the future.
  3. Ascend scheduler is broken by [Misc] Make SchedulerConfig.max_model_len init-only vllm#28733. We'll remove the Ascend scheduler later (see the sketch after this list).
  4. qwen3-next is broken by [PERF] Decouple projections from GDN custom op. Attempt 2 vllm#28083. We'll remove the model files in the future to avoid this kind of error.
  5. qwen vl is broken by [Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile vllm#27764. We'll remove the model files in the future.
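
For item 3, the upstream change makes SchedulerConfig.max_model_len init-only. The sketch below is a generic Python illustration of what an init-only dataclass field means (SketchSchedulerConfig and its fields are hypothetical, not the real vLLM class) and why code that reads the attribute after construction stops working:

from dataclasses import InitVar, dataclass, field

@dataclass
class SketchSchedulerConfig:
    # Accepted by __init__ but not stored as an attribute afterwards.
    max_model_len: InitVar[int] = 8192
    max_num_batched_tokens: int = field(default=0)

    def __post_init__(self, max_model_len: int) -> None:
        # The value is only usable here, e.g. to derive other settings.
        self.max_num_batched_tokens = max(self.max_num_batched_tokens, max_model_len)

cfg = SketchSchedulerConfig(max_model_len=4096)
print(cfg.max_num_batched_tokens)  # 4096
# Any scheduler code that still reads cfg.max_model_len now raises AttributeError.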

Known issues:

  1. Ray doesn't work.
  2. The accuracy of qwen3-next is not correct.
  3. qwen3-vl is broken.
  4. Prefix cache + Ascend scheduler + DeepSeek V2 Lite is broken.

Co-authored-by: MengqingCao [email protected]
Co-authored-by: hfadzxy [email protected]
Co-authored-by: leo-pony [email protected]
Co-authored-by: 22dimensions [email protected]
Co-authored-by: shen-shanshan [email protected]

github-actions bot added the documentation (Improvements or additions to documentation), module:tests, module:ops, and module:core labels on Nov 24, 2025
github-actions bot commented:

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request upgrades the vLLM dependency to version v0.11.2 and adapts the codebase to the corresponding upstream changes. The modifications are extensive and touch upon various components, including Dockerfiles, documentation, tests, and core implementation files.

Key changes include:

  • Updating the VLLM_TAG in all Dockerfiles to v0.11.2 and optimizing the git clone process.
  • A significant architectural refactoring in the model runner that splits the model execution logic into two distinct steps: execute_model for the forward pass and a new sample_tokens method for token sampling and applying grammar constraints. This change is consistently propagated through the worker and model runner implementations (a minimal sketch follows this review).
  • Adapting to API changes in SchedulerOutput by removing grammar-related fields, which are now handled in the new sample_tokens step.
  • Updating the custom attention backend registration to align with vLLM's new decorator-based system.
  • Refactoring model implementations such as Qwen2.5-VL and Qwen3-Next to align with upstream method renames and simplifications, including the removal of custom workarounds that are no longer necessary.
  • Enhancements to the MultiprocExecutor to better support multi-node distributed execution.

The changes are well-integrated and appear to correctly adapt the project to the new vLLM version. I have not identified any critical or high-severity issues in this pull request.
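
To make the described two-step flow concrete, here is a minimal, self-contained sketch. Only the execute_model / sample_tokens split mirrors the refactoring above; SketchModelRunner, SketchSchedulerOutput, and the toy sampling logic are illustrative assumptions, not the actual vLLM or vllm-ascend interfaces:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SketchSchedulerOutput:
    # Grammar-related fields are no longer carried here; structured-output
    # constraints are applied later, in sample_tokens.
    scheduled_request_ids: list = field(default_factory=list)

class SketchModelRunner:
    def __init__(self) -> None:
        self._hidden_states: Optional[list] = None

    def execute_model(self, scheduler_output: SketchSchedulerOutput) -> None:
        # Step 1: run the forward pass only and keep the per-request state;
        # no tokens are sampled at this point.
        self._hidden_states = [len(r) % 7 for r in scheduler_output.scheduled_request_ids]

    def sample_tokens(self, grammar_allowed: Optional[set] = None) -> list:
        # Step 2: build candidates, apply the grammar constraint if present,
        # then pick one token per request (a greedy stand-in for sampling).
        assert self._hidden_states is not None, "execute_model must run first"
        tokens = []
        for h in self._hidden_states:
            candidates = [t for t in range(10)
                          if grammar_allowed is None or t in grammar_allowed]
            tokens.append(min(candidates, key=lambda t: abs(t - h)))
        return tokens

runner = SketchModelRunner()
runner.execute_model(SketchSchedulerOutput(scheduled_request_ids=["req-0", "req-1"]))
print(runner.sample_tokens(grammar_allowed={1, 2, 3}))  # one token per request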

wangxiyuan changed the title from "upgrade to vllm new commit" to "upgrade to vllm 0.11.2" on Nov 24, 2025
zhangxinyuehfad (Contributor) commented:

@leo-pony The Multi-Node-Ray test failed. Log:

(EngineCore_DP0 pid=300679) (RayWorkerWrapper pid=300872) INFO 11-24 08:50:32 [__init__.py:106] Registered model loader `<class 'vllm_ascend.model_loader.netloader.netloader.ModelNetLoaderElastic'>` with load format `netloader`
(EngineCore_DP0 pid=300679) (RayWorkerWrapper pid=300872) WARNING 11-24 08:50:33 [worker_base.py:301] Missing `shared_worker_lock` argument from executor. This argument is needed for mm_processor_cache_type='shm'.
(EngineCore_DP0 pid=300679) (RayWorkerWrapper pid=300872) INFO 11-24 08:50:33 [utils.py:973] FLASHCOMM2 not enable.
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842] EngineCore failed to start.
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842] Traceback (most recent call last):
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 833, in run_engine_core
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 606, in __init__
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     super().__init__(
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 102, in __init__
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self._init_executor()
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/ray_executor.py", line 97, in _init_executor
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self._init_workers_ray(placement_group)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/ray_executor.py", line 370, in _init_workers_ray
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self.collective_rpc("init_device")
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/ray_executor.py", line 493, in collective_rpc
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return ray.get(ray_worker_outputs, timeout=timeout)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return fn(*args, **kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return func(*args, **kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/ray/_private/worker.py", line 2858, in get
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/ray/_private/worker.py", line 958, in get_objects
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     raise value.as_instanceof_cause()
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842] ray.exceptions.RayTaskError(AssertionError): ray::RayWorkerWrapper.execute_method() (pid=300878, ip=172.22.0.188, actor_id=ccad69f02f06cafa8981145201000000, repr=<vllm.v1.executor.ray_utils.RayWorkerWrapper object at 0xffcfbc328810>)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/worker/worker_base.py", line 343, in execute_method
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     raise e
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/worker/worker_base.py", line 332, in execute_method
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return run_method(self, method, args, kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return func(*args, **kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/worker/worker_base.py", line 324, in init_device
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self.worker.init_device()  # type: ignore
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 236, in init_device
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self.device = self._init_device()
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]                   ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 220, in _init_device
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     assert self.parallel_config.local_world_size <= visible_device_count, (
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842] AssertionError: local_world_size (32) must be less than or equal to the number of visible devices (16).

A comment from zhangxinyuehfad on the following lines in vllm_ascend/worker/worker_v1.py was marked as resolved:

visible_device_count = (torch.npu.device_count()
                        if torch.npu.is_available() else 0)
assert self.parallel_config.local_world_size <= visible_device_count, (

wangxiyuan (Collaborator Author) replied:

ray error

wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels on Nov 24, 2025
github-actions bot commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

wangxiyuan (Collaborator Author) commented.

github-actions bot commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

wangxiyuan (Collaborator Author) commented:

Signed-off-by: hfadzxy <[email protected]>
MengqingCao (Collaborator) left a comment:

Let's address the known issues in follow-up PRs.

@wangxiyuan wangxiyuan merged commit bc69d7c into vllm-project:main Nov 26, 2025
29 of 40 checks passed
Kurumi5210 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Nov 26, 2025
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Nov 29, 2025
